Decoder-native resize public implementation #1003
base: main
Conversation
```python
dimension_order: Literal["NCHW", "NHWC"] = "NCHW",
num_ffmpeg_threads: int = 1,
device: Optional[Union[str, torch_device]] = "cpu",
transforms: List[Any] = [],  # TRANSFORMS TODO: what is the user-facing type?
```
Discussion point 1: If we accept TorchVision transforms, and we want to lazily load TorchVision, what type do we advertise here? We can easily explain that we accept a TorchVision transform in the docs, but what should we put in the type annotation?
It should probably be either `Any` or `nn.Module`. The latter is the base class of all TorchVision v2 transforms, and something users are already familiar with, since it is the core building block of any PyTorch model.
Oh, that solves the problem nicely: it can definitely be nn.Module.
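A rough sketch of what that could look like: annotating the parameter as `nn.Module` keeps the signature honest without importing TorchVision at module load time. The function name and validation below are illustrative only, not the actual TorchCodec API:

```python
from typing import List, Literal, Optional, Union

import torch
from torch import nn


def create_decoder(
    dimension_order: Literal["NCHW", "NHWC"] = "NCHW",
    num_ffmpeg_threads: int = 1,
    device: Optional[Union[str, torch.device]] = "cpu",
    # TorchVision v2 transforms all subclass nn.Module, so this annotation
    # covers them without requiring an eager `import torchvision`.
    transforms: Optional[List[nn.Module]] = None,
) -> None:
    transforms = transforms if transforms is not None else []
    for t in transforms:
        if not isinstance(t, nn.Module):
            raise TypeError(
                f"Expected an nn.Module transform, got {type(t).__name__}"
            )
```

Note the `Optional[...] = None` default instead of a mutable `[]` default, which sidesteps the shared-mutable-default pitfall in the quoted diff.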
```ini
show_error_codes = True
pretty = True
allow_redefinition = True
follow_untyped_imports = True
```
I was getting linting errors like this run: https://github.com/meta-pytorch/torchcodec/actions/runs/19157614790/job/54761644331
It points to the docs that recommend the above change: https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
Not ready for merging or final review. This is a demonstration of accepting both TorchVision transforms and complementary transform specifications defined in TorchCodec. The purpose of this PR, right now, is to get agreement on whether this is the right approach. If we can agree, I'll polish everything and write more tests.
For context, in #526, I had initially proposed not using TorchVision transforms, and instead coming up with TorchCodec specific versions. @NicolasHug proposed that we accept TorchVision transforms, and that's what I followed up with in my design in #885.
After discussing the previous iteration of this PR, we agreed we wanted to see what it would look like to accept both. Having implemented this, I agree it's the right thing to do:
The `torchcodec.transforms` class is a clear representation in code of what is supported. In the old PR, that mapping was buried in the logic that turned the TorchVision transform directly into the specification string the core API needs.

I can easily be convinced to use `torchcodec.transforms.Transform` instead of `torchcodec.transforms.DecoderNativeTransform`. My motivation there is to make it more likely users will not confuse these with TorchVision's transforms. We also might be able to think of a better user-facing name for this feature, and these transforms, than "decoder native." Some concrete questions:

- `Transform`, `DecoderNativeTransform`, or some other name for the base class?
- `torchcodec.transforms`, `torchcodec.decoder_native_transforms`, or something else? Putting it in the module path may make it easier to just use `Transform`. And, for that matter, `Resize`.
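To make the naming question concrete, here is a minimal sketch of what the base class and `Resize` could look like. The class names come from this discussion, but the abstract method, the dataclass shape, and the format of the specification string are all assumptions for illustration, not the PR's actual implementation:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Tuple


class DecoderNativeTransform(ABC):
    """Base class for transforms applied inside the decoder.

    Illustrative sketch only; the real base class may differ.
    """

    @abstractmethod
    def _make_params(self) -> str:
        # Hypothetical: serialize this transform into the specification
        # string that the core decoding API consumes.
        ...


@dataclass
class Resize(DecoderNativeTransform):
    size: Tuple[int, int]  # (height, width)

    def _make_params(self) -> str:
        # Hypothetical spec-string format.
        return f"resize, {self.size[0]}, {self.size[1]}"
```

Keeping the TorchVision-to-spec mapping in classes like these, rather than inline conversion logic, is what makes the supported set of operations visible at a glance.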